Goto

Collaborating Authors

 network revenue management


Online Learning and Pricing for Network Revenue Management with Reusable Resources

Neural Information Processing Systems

We consider a price-based network revenue management problem with multiple products and multiple reusable resources. Each randomly arriving customer requests a product (service) that needs to occupy a sequence of reusable resources (servers). We adopt an incomplete information setting where the firm does not know the price-demand function for each product and the goal is to dynamically set prices of all products to maximize the total expected revenue of serving customers. We propose novel batched bandit learning algorithms for finding near-optimal pricing policies, and show that they admit a near-optimal cumulative regret bound of $\tilde{O}(J\sqrt{XT})$, where $J$, $X$, and $T$ are the numbers of products, candidate prices, and service periods, respectively. As part of our regret analysis, we develop the first finite-time mixing time analysis of an open network queueing system (i.e., the celebrated Jackson Network), which could be of independent interest. Our numerical studies show that the proposed approaches perform consistently well.


Continuum-armed Bandit Optimization with Batch Pairwise Comparison Oracles

Chang, Xiangyu, Chen, Xi, Wang, Yining, Zeng, Zhiyi

arXiv.org Machine Learning

This paper studies a bandit optimization problem where the goal is to maximize a function $f(x)$ over $T$ periods for some unknown strongly concave function $f$. We consider a new pairwise comparison oracle, where the decision-maker chooses a pair of actions $(x, x')$ for a consecutive number of periods and then obtains an estimate of $f(x)-f(x')$. We show that such a pairwise comparison oracle finds important applications to joint pricing and inventory replenishment problems and network revenue management. The challenge in this bandit optimization is twofold. First, the decision-maker not only needs to determine a pair of actions $(x, x')$ but also a stopping time $n$ (i.e., the number of queries based on $(x, x')$). Second, motivated by our inventory application, the estimate of the difference $f(x)-f(x')$ is biased, which is different from existing oracles in stochastic optimization literature. To address these challenges, we first introduce a discretization technique and local polynomial approximation to relate this problem to linear bandits. Then we developed a tournament successive elimination technique to localize the discretized cell and run an interactive batched version of LinUCB algorithm on cells. We establish regret bounds that are optimal up to poly-logarithmic factors. Furthermore, we apply our proposed algorithm and analytical framework to the two operations management problems and obtain results that improve state-of-the-art results in the existing literature.


Online Learning and Pricing for Network Revenue Management with Reusable Resources

Neural Information Processing Systems

We consider a price-based network revenue management problem with multiple products and multiple reusable resources. Each randomly arriving customer requests a product (service) that needs to occupy a sequence of reusable resources (servers). We adopt an incomplete information setting where the firm does not know the price-demand function for each product and the goal is to dynamically set prices of all products to maximize the total expected revenue of serving customers. We propose novel batched bandit learning algorithms for finding near-optimal pricing policies, and show that they admit a near-optimal cumulative regret bound of \tilde{O}(J\sqrt{XT}), where J, X, and T are the numbers of products, candidate prices, and service periods, respectively. As part of our regret analysis, we develop the first finite-time mixing time analysis of an open network queueing system (i.e., the celebrated Jackson Network), which could be of independent interest.


Infrequent Resolving Algorithm for Online Linear Programming

Li, Guokai, Wang, Zizhuo, Zhang, Jingwei

arXiv.org Artificial Intelligence

Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auction, network revenue management and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former one typically guarantees better performance, even offering a constant regret, but requires solving a large number of LPs, which could be computationally expensive. In contrast, LP-free algorithm only requires first-order computations but induces a worse performance, lacking a constant regret bound. In this work, we bridge the gap between these two extremes by proposing an algorithm that achieves a constant regret while solving LPs only $O(\log\log T)$ times over the time horizon $T$. Moreover, when we are allowed to solve LPs only $M$ times, we propose an algorithm that can guarantee an $O\left(T^{(1/2+\epsilon)^{M-1}}\right)$ regret. Furthermore, when the arrival probabilities are known at the beginning, our algorithm can guarantee a constant regret by solving LPs $O(\log\log T)$ times, and an $O\left(T^{(1/2+\epsilon)^{M}}\right)$ regret by solving LPs only $M$ times. Numerical experiments are conducted to demonstrate the efficiency of the proposed algorithms.


A Re-solving Heuristic for Dynamic Assortment Optimization with Knapsack Constraints

Chen, Xi, Liu, Mo, Wang, Yining, Zhou, Yuan

arXiv.org Machine Learning

In this paper, we consider a multi-stage dynamic assortment optimization problem with multi-nomial choice modeling (MNL) under resource knapsack constraints. Given the current resource inventory levels, the retailer makes an assortment decision at each period, and the goal of the retailer is to maximize the total profit from purchases. With the exact optimal dynamic assortment solution being computationally intractable, a practical strategy is to adopt the re-solving technique that periodically re-optimizes deterministic linear programs (LP) arising from fluid approximation. However, the fractional structure of MNL makes the fluid approximation in assortment optimization highly non-linear, which brings new technical challenges. To address this challenge, we propose a new epoch-based re-solving algorithm that effectively transforms the denominator of the objective into the constraint. Theoretically, we prove that the regret (i.e., the gap between the resolving policy and the optimal objective of the fluid approximation) scales logarithmically with the length of time horizon and resource capacities.


Nearly Bounded Regret of Re-solving Heuristics in Price-based Revenue Management

Wang, Yining

arXiv.org Machine Learning

Price-based revenue management is a class of important questions in operations management. In its simplest form, a retailer sells a single product over $T$ consecutive time periods and is subject to constraints on the initial inventory levels. While the optimal pricing policy over $T$ periods could be obtained via dynamic programming, such an approach is sometimes undesirable because of its enormous computational costs. Approximately optimal policies, such as the re-solving heuristic, is often applied as a computationally tractable alternative. In this paper, we prove the following results: 1. We prove that a popular and commonly used re-solving heuristic attains an $O(\ln\ln T)$ regret compared to the value of the optimal DP pricing policy. This improves the $O(\ln T)$ regret upper bound established in the prior work of (Jasin 2014). 2. We prove that there is an $\Omega(\ln T)$ gap between the value of the optimal DP pricing policy and that of a static LP relaxation. This complements our upper bound results in showing that the static LP relaxation is not an adequate information-relaxed benchmark when analyzing price-based revenue management algorithms.